-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(dot/peerset): remove race conditions from peerset
package
#2267
Conversation
Codecov Report
@@ Coverage Diff @@
## development #2267 +/- ##
===============================================
- Coverage 61.32% 61.26% -0.07%
===============================================
Files 216 216
Lines 28403 28493 +90
===============================================
+ Hits 17418 17455 +37
- Misses 9198 9245 +47
- Partials 1787 1793 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
ff902fc
to
f0c0e10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!!
For most things I only have minor comments apart from peerset_test.go
.
Tests is peerset_test.go
do what they are supposed to but what they are testing is not obvious. It can probably be better structured.
Once that is fixed, I think this is good to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the diff of this PR could've been a lot smaller, but I liked how you wrapped a lot of the errors and made the code easier to read.
dot/peerset/peerset.go
Outdated
// default channel size for peerSet. | ||
|
||
msgChanSize = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit out of scope, but do we actually need a buffer channel??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have both actionQueue
and resultMsgCh
as buffered channels. In fact, I believe, we don't need this msgChanSize
, should I open an issue to turn them into unbuffered channels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe open an issue to investigate all buffered channels requirement? You can add that one as a first bullet point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much of the code is variable renaming and restructuring of code, which sort of hides the actual problem that you have fixed.
I have commented when I am not on board with changes. I would say, we should not restyle unless it is obviously necessary.
But, mostly I am requesting changes for some of the channel related comments I have made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits here and there 👍
dot/peerset/peerset.go
Outdated
if lastDiscoveredTime.Add(forgetAfterTime).Second() >= currTime.Second() { | ||
continue | ||
} | ||
|
||
// forget peer removes the peer from the list of members of the set. | ||
err = ps.peerState.forgetPeer(set, peerID) | ||
if err != nil { | ||
return err | ||
return fmt.Errorf("cannot forget peer %s from set %d: %w", peerID, set, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EclesioMeloJunior I don't think this is done for that particular line 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few additional nit changes to do 🙏
But otherwise ready to merge 🚀
@@ -318,7 +318,7 @@ func (ps *PeerSet) updateTime() error { | |||
// forget peer removes the peer from the list of members of the set. | |||
err = ps.peerState.forgetPeer(set, peerID) | |||
if err != nil { | |||
return fmt.Errorf("cannot forget peer %s from set %d: %w", peerID, set, err) | |||
return fmt.Errorf("cannot forget peer: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit add the set context in the forgetPeer
error fmt.Errorf("%w: for peer id %s", ErrPeerDoesNotExist, peerID)
@@ -340,7 +340,7 @@ func (ps *PeerSet) reportPeer(change ReputationChange, peers ...peer.ID) error { | |||
for _, pid := range peers { | |||
rep, err := ps.peerState.addReputation(pid, change) | |||
if err != nil { | |||
return fmt.Errorf("cannot add reputation (%s) to peer %s: %w", change, pid, err) | |||
return fmt.Errorf("cannot add reputation: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit add the change
to the error context in addReputation
: fmt.Errorf("%w: for peer id %s", ErrPeerDoesNotExist, peerID)
@@ -356,7 +356,7 @@ func (ps *PeerSet) reportPeer(change ReputationChange, peers ...peer.ID) error { | |||
// disconnect peer | |||
err = ps.peerState.disconnect(i, pid) | |||
if err != nil { | |||
return fmt.Errorf("cannot disconnect peer %s at set %d: %w", pid, i, err) | |||
return fmt.Errorf("cannot disconnect: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit add the set context to the error wrapping in fmt.Errorf("%w: for peer id %s", ErrPeerDoesNotExist, peerID)
in the disconnect method
@@ -402,7 +402,7 @@ func (ps *PeerSet) allocSlots(setIdx int) error { | |||
} | |||
|
|||
if err = peerState.tryOutgoing(setIdx, reservePeer); err != nil { | |||
return fmt.Errorf("cannot set peer %s from set %d as outgoing: %w", reservePeer, setIdx, err) | |||
return fmt.Errorf("cannot set as outgoing: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit add the setidx context to the error wrapping in tryOutgoing
# [0.7.0](v0.6.0...v0.7.0) (2022-11-23) ### Bug Fixes * **chain:** update ed25519 addresses in dev/gssmr genesis files ([#2225](#2225)) ([5f47d8b](5f47d8b)) * **ci:** caching of Go caches ([#2451](#2451)) ([ce3c10c](ce3c10c)) * **ci:** codecov.yml configuration ([#2698](#2698)) ([d4fc383](d4fc383)) * **ci:** comment skip code for required workflows ([#2312](#2312)) ([45dce9b](45dce9b)) * **ci:** copyright workflow to exit if different files ([#2487](#2487)) ([89c32ae](89c32ae)) * **ci:** deepsource toml configuration ([#2744](#2744)) ([86a70de](86a70de)) * **ci:** embed v0.9.20 runtime, update test suite, and ci workflows ([#2543](#2543)) ([0fff418](0fff418)), closes [#2419](#2419) [#2561](#2561) [#2572](#2572) [#2581](#2581) [#2671](#2671) * **ci:** fix staging Dockerfile ([#2474](#2474)) ([ae04b80](ae04b80)) * **ci:** mocks checking fixes ([#2274](#2274)) ([d1308e0](d1308e0)) * **ci:** run devnet module unit tests ([#2756](#2756)) ([f635c59](f635c59)) * **ci:** run golangci-lint on integration tests ([#2275](#2275)) ([3ae3401](3ae3401)) * **cmd:** allow --genesis flag to be passed to base command ([#2427](#2427)) ([7f5b5aa](7f5b5aa)) * **cmd:** avoid nil pointer dereference ([#2578](#2578)) ([f2cdfea](f2cdfea)) * **config:** temporary fix for pprof enabled setting precedence ([#2786](#2786)) ([d4d6262](d4d6262)) * **core:** fix txn pool for latest runtime ([#2809](#2809)) ([1551e66](1551e66)) * **deps:** upgrade chaindb to remove badger logs ([#2738](#2738)) ([e0c5706](e0c5706)) * **devnet:** Fix build workflow for devnet ([#2125](#2125)) ([0375fc2](0375fc2)) * **Dockerfile:** remove script entrypoint ([#2707](#2707)) ([abd161b](abd161b)) * **dot/core:** `RuntimeInstance` interface `Version` signature ([#2783](#2783)) ([7d66ec0](7d66ec0)) * **dot/core:** fix the race condition in TrieState ([#2499](#2499)) ([804069c](804069c)), closes [#2402](#2402) * **dot/digest:** BABE NextEpochData and NextConfigData should be set on finalization ([#2339](#2339)) ([e991cc8](e991cc8)) * **dot/digest:** verify if next epoch already contains some definition ([#2472](#2472)) ([a2ac6c2](a2ac6c2)) * **dot/netwok:** check for duplicate message earlier ([#2435](#2435)) ([d62503f](d62503f)) * **dot/network:** change BlockRequestMessage number from uint64 to uint32 ([8105cd4](8105cd4)) * **dot/network:** close notifications streams ([#2093](#2093)) ([de6e7c9](de6e7c9)), closes [#2046](#2046) * **dot/network:** fixing errMissingHandshakeMutex ([#2303](#2303)) ([eb07a53](eb07a53)) * **dot/network:** memory improvement for network buffers ([#2233](#2233)) ([fd9b70d](fd9b70d)) * **dot/network:** public IP address logging ([#2140](#2140)) ([9e21587](9e21587)) * **dot/network:** re-add nil mutex check for disconnected peer ([#2408](#2408)) ([9b39bd1](9b39bd1)) * **dot/network:** remove `defer cancel()` inside loop ([#2248](#2248)) ([9e360a5](9e360a5)) * **dot/network:** resize bytes slice buffer if needed ([#2291](#2291)) ([8db8b2a](8db8b2a)) * **dot/peerset:** fix sending on closed channel race condition when dropping peer ([#2573](#2573)) ([2fa5d8a](2fa5d8a)) * **dot/peerset:** remove race conditions from `peerset` package ([#2267](#2267)) ([df09d45](df09d45)) * **dot/rpc/modules:** grandpa.proveFinality update parameters, fix bug ([#2576](#2576)) ([e7749cf](e7749cf)) * **dot/rpc/modules:** rpc.state.queryStorage fixed ([#2565](#2565)) ([1ec0d47](1ec0d47)) * **dot/rpc:** include unsafe flags to be considered by RPC layer ([#2483](#2483)) ([3822257](3822257)) * **dot/state/epoch, lib/babe:** enable block production through epochs without rely on finalization ([#2593](#2593)) ([a0a1804](a0a1804)) * **dot/state:** actually prune finalized tries from memory ([#2196](#2196)) ([e4bc375](e4bc375)) * **dot/state:** change map of tries implementation to have working garbage collection ([#2206](#2206)) ([fada46b](fada46b)) * **dot/state:** inject mutex protected tries to states ([#2287](#2287)) ([67a9bbb](67a9bbb)) * **dot/subscription:** check websocket message from untrusted data ([#2527](#2527)) ([1f20d98](1f20d98)) * **dot/subscription:** unsafe type casting from untrusted input ([#2529](#2529)) ([1015733](1015733)) * **dot/sync, dot/rpc:** implement HighestBlock ([#2195](#2195)) ([f8d8657](f8d8657)) * **dot/sync:** cleanup logs; don't log case where we fail to get parent while processing ([#2188](#2188)) ([cb360ab](cb360ab)) * **dot/sync:** fix "block with unknown header is ready" error ([#2191](#2191)) ([483466f](483466f)) * **dot/sync:** fix `Test_lockQueue_threadSafety` ([#2605](#2605)) ([223cfbb](223cfbb)) * **dot/sync:** Fix flaky tests `Test_chainSync_logSyncSpeed` and `Test_chainSync_start` ([#2610](#2610)) ([7e1014b](7e1014b)) * **dot/sync:** Gossip `BlockAnnounceMessage` only after successfully imported ([#2885](#2885)) ([69031a6](69031a6)) * **dot/sync:** remove block announcement in `bootstrap` sync mode ([#2906](#2906)) ([2b4c257](2b4c257)) * **dot/sync:** remove max size limit from ascending block requests ([#2256](#2256)) ([e287d7e](e287d7e)) * **dot/sync:** sync benchmark ([#2234](#2234)) ([2f3aef8](2f3aef8)) * **dot/telemetry:** telemetry hashes to be in the hexadecimal format ([#2194](#2194)) ([9b48106](9b48106)) * **dot:** database close error checks ([#2948](#2948)) ([bdb0eea](bdb0eea)) * **dot:** no error logged for init check ([#2502](#2502)) ([2971325](2971325)) * ensure we convert the `uint` type ([#2626](#2626)) ([792e53f](792e53f)) * fix logger mutex locking in `.New` method ([#2114](#2114)) ([e7207ed](e7207ed)) * **internal/log:** log level `DoNotChange` ([#2672](#2672)) ([0008b59](0008b59)) * **levels-logged:** Fix log levels logging at start ([#2236](#2236)) ([a90a6e0](a90a6e0)) * **lib/babe:** check if authority index is in the `authorities` range ([#2601](#2601)) ([1072888](1072888)) * **lib/babe:** ensure the slot time is correct before build a block ([#2648](#2648)) ([78c03b6](78c03b6)) * **lib/babe:** epoch context error wrapping ([#2484](#2484)) ([c053dea](c053dea)) * **lib/babe:** Unrestricted Loop When Building Blocks (GSR-19) ([#2632](#2632)) ([139ad89](139ad89)) * **lib/blocktree:** reimplement `BestBlockHash` to take into account primary blocks in fork choice rule ([#2254](#2254)) ([1a368e2](1a368e2)) * **lib/grandpa:** avoid spamming round messages ([#2688](#2688)) ([b0042b8](b0042b8)) * **lib/grandpa:** capped number of tracked commit messages ([#2490](#2490)) ([47c23e6](47c23e6)) * **lib/grandpa:** capped number of tracked vote messages ([#2485](#2485)) ([d2ee47e](d2ee47e)), closes [#1531](#1531) * **lib/grandpa:** check equivocatory votes count ([#2497](#2497)) ([014629d](014629d)), closes [#2401](#2401) * **lib/grandpa:** Duplicate votes is GRANDPA are counted as equivocatory votes (GSR-11) ([#2624](#2624)) ([422e7b3](422e7b3)) * **lib/grandpa:** Storing Justification Allows Extra Bytes (GSR-13) ([#2618](#2618)) ([0fcde63](0fcde63)) * **lib/grandpa:** update grandpa protocol ID ([#2678](#2678)) ([3be75b2](3be75b2)) * **lib/grandpa:** various finality fixes, improves cross-client finality ([#2368](#2368)) ([c04d185](c04d185)) * **lib/grandpa:** verify equivocatory votes in grandpa justifications ([#2486](#2486)) ([368f8b6](368f8b6)) * **lib/runtime:** avoid caching version in runtime instance ([#2425](#2425)) ([7ab31f0](7ab31f0)) * **lib/runtime:** stub v0.9.17 host API functions ([#2420](#2420)) ([6a7b223](6a7b223)) * **lib/trie:** `handleDeletion` generation propagation ([24c303d](24c303d)) * **lib/trie:** `PopulateMerkleValues` functionality changes and fixes ([#2871](#2871)) ([7131290](7131290)) * **lib/trie:** Check for root in EncodeAndHash ([#2359](#2359)) ([087db89](087db89)) * **lib/trie:** Make sure writing and reading a trie to disk gives the same trie and cover more store/load child trie related test cases ([#2302](#2302)) ([7cd4118](7cd4118)) * **lib/trie:** prepare trie nodes for mutation only when needed ([#2834](#2834)) ([26868df](26868df)) * **lib/trie:** remove map deletion at `loadProof` ([#2259](#2259)) ([fbd13d2](fbd13d2)) * **lint:** fix issues found by golangcilint 1.47.3 ([#2715](#2715)) ([5765e67](5765e67)) * **mocks:** add missing `//go:generate` for mocks ([#2273](#2273)) ([f4f7465](f4f7465)) * **pprof:** pprofserver flag changed to boolean ([#2205](#2205)) ([be00a69](be00a69)) * **staging:** revise datadog-agent start process ([#2935](#2935)) ([36ce37d](36ce37d)) * **state/epoch:** assign epoch 1 when block number is 0 ([#2592](#2592)) ([e5c8cf5](e5c8cf5)) * **state/grandpa:** track changes across forks ([#2519](#2519)) ([3ab76bc](3ab76bc)) * **tests:** `TestAuthorModule_HasSessionKeys_Integration` ([#2932](#2932)) ([8d809aa](8d809aa)) * **tests:** fix block body regex in `TestChainRPC` ([#2805](#2805)) ([b0680f8](b0680f8)) * **tests:** Fix RFC3339 regex for log unit tests ([9caea2a](9caea2a)) * **tests:** Fix wasmer flaky sorts ([#2643](#2643)) ([7eede9a](7eede9a)) * **tests:** handle node crash during waiting ([#2691](#2691)) ([843bd50](843bd50)) * **tests:** update block body regex in `TestChainRPC` ([#2674](#2674)) ([055e5c3](055e5c3)) * **trie:** decode inline child nodes ([#2369](#2369)) ([9efde47](9efde47)) * **trie:** descendants count for clear prefix ([#2606](#2606)) ([1826896](1826896)) * **trie:** disallow empty byte slice node values ([#2927](#2927)) ([d769d1c](d769d1c)) * **trie:** equality differentiate nil and empty storage values ([#2969](#2969)) ([72a08ec](72a08ec)) * **trie:** no in-memory caching of node encoding ([#2919](#2919)) ([856780b](856780b)) * **trie:** Panic when deleting nonexistent keys from trie (GSR-10) ([#2609](#2609)) ([7886318](7886318)) * **trie:** remove encoding buffers pool ([#2929](#2929)) ([f4074cc](f4074cc)) * **trie:** use cached Merkle values for root hash ([#2943](#2943)) ([ec2549a](ec2549a)) * **trie:** use direct Merkle value for database keys ([#2725](#2725)) ([1a3c3ae](1a3c3ae)) * upgrade auto-generated mocks ([#2910](#2910)) ([a2975a5](a2975a5)) * **wasmer:** error logs for signature verification ([#2752](#2752)) ([363c080](363c080)) * **wasmer:** fix flaky sort in `Test_ext_crypto_sr25519_public_keys_version_1` ([#2607](#2607)) ([c061b35](c061b35)) ### Features * **build:** add `github.com/breml/rootcerts` ([#2695](#2695)) ([c74a5b0](c74a5b0)) * **build:** binary built-in timezone data ([#2697](#2697)) ([fdd5bda](fdd5bda)) * **chain:** use always the raw genesis file ([#2775](#2775)) ([dd2fbc9](dd2fbc9)) * **ci:** update mockery from `2.10` to `2.14` ([#2642](#2642)) ([d2c42b8](d2c42b8)) * **cross-client:** create docker-compose.yml for local devnet ([#2282](#2282)) ([8abbd87](8abbd87)) * detect chain directory dynamically ([#2292](#2292)) ([85c466c](85c466c)) * **devnet:** add substrate docker images to dockerfile ([#2263](#2263)) ([b7b2a66](b7b2a66)) * **devnet:** continuous integration `gssmr` devnet on AWS ECS ([#2096](#2096)) ([d096d44](d096d44)) * **docker:** docker-compose.yml to run Gossamer, Prometheus and Grafana ([#2706](#2706)) ([c5dda51](c5dda51)) * **dot/network:** add mismatched genesis peer reporting ([#2265](#2265)) ([a1d7269](a1d7269)) * **dot/state:** `gossamer_storage_tries_cached_total` gauge metric ([#2272](#2272)) ([625cbcf](625cbcf)) * **e2e:** build Gossamer on any test run ([#2608](#2608)) ([f97e0ef](f97e0ef)) * **go:** upgrade Go from 1.17 to 1.18 ([#2379](#2379)) ([d85a1db](d85a1db)) * include nested varying data type on neighbor messages ([#2722](#2722)) ([426569a](426569a)) * **lib/babe:** implement secondary slot block production ([#2260](#2260)) ([fcb81a3](fcb81a3)) * **lib/runtime:** support Substrate WASM compression ([#2213](#2213)) ([fd60061](fd60061)) * **lib/trie:** atomic tracked merkle values ([#2876](#2876)) ([1c4174c](1c4174c)) * **lib/trie:** clear fields when node is dirty ([#2297](#2297)) ([1162828](1162828)) * **lib/trie:** only copy nodes when mutation is certain ([#2352](#2352)) ([86624cf](86624cf)) * **lib/trie:** opportunistic parallel hashing ([#2081](#2081)) ([790dfb5](790dfb5)) * **metrics:** replace metrics port with address (breaking change) ([#2382](#2382)) ([d2ec68d](d2ec68d)) * **pkg/scale:** add `Encoder` with `Encode` method ([#2741](#2741)) ([af5c63f](af5c63f)) * **pkg/scale:** add use of pkg/error Wrap for error handling ([#2708](#2708)) ([08c4281](08c4281)) * **pkg/scale:** encoding and decoding of maps in scale ([#2894](#2894)) ([405db51](405db51)), closes [#2796](#2796) * **pkg/scale:** support for custom `VaryingDataType` types ([#2612](#2612)) ([914a747](914a747)) * remove uneeded runtime prefix logs ([#2110](#2110)) ([8bd05d1](8bd05d1)) * remove unused code ([#2677](#2677)) ([b3698d7](b3698d7)) * **scale:** add range checks to decodeUint function ([#2683](#2683)) ([ac700f8](ac700f8)) * **trie:** decode all inlined node variants ([#2611](#2611)) ([b09eb07](b09eb07)) * **trie:** export `LoadFromProof` ([#2455](#2455)) ([0b4f33d](0b4f33d)) * **trie:** faster header decoding ([#2649](#2649)) ([d9460e3](d9460e3)) * **trie:** finer deep copy of nodes ([#2384](#2384)) ([bd6d8e4](bd6d8e4)) * **trie:** tracking of number of descendant nodes for each node ([#2378](#2378)) ([dfcdd3c](dfcdd3c)) * **trie:** use scale encoder ([#2930](#2930)) ([e3dc108](e3dc108)) * **wasmer/crypto:** move sig verifier to crypto pkg ([#2057](#2057)) ([dc8bbef](dc8bbef)) * **wasmer:** Add `SetTestVersion` method to `Config` struct ([#2823](#2823)) ([e5c9336](e5c9336)) * **wasmer:** get and cache state version in instance context ([#2747](#2747)) ([3fd63db](3fd63db))
Changes
peerset
packageTests
Issues
PeersState.lastConnectedAndDiscovered
#2266t.Parallel
to all tests #2105Primary Reviewer